A New Approach to Clustering Biological Data Using Message Passing.

Author: Geng Huimin
Publication venue: DigitalCommons@UNO
Publication date: 01/09/2001

Motivation: Clustering algorithms are widely used in bioinformatics, having been applied to a range of problems from the analysis of gene expression to the building of phylogenetic trees. Biological data often describe parallel and spontaneous processes such as molecular interactions and genome evolution. To capture these features, we propose a new clustering algorithm that employs the concept of message passing. Methods: Inspired by a real-world situation in which people who have never met can form groups by exchanging messages, Message Passing Clustering (MPC) allows data objects to communicate with each other and produces clusters in parallel, thereby making the clustering process intrinsic. Other advantages of MPC over traditional clustering methods are that it is relatively straightforward to understand and implement and that it takes into account both local and global structure. We have proved that MPC shares similarity with Hierarchical Clustering (HC) but offers significantly improved performance. Results: To validate the MPC method, we analyzed 35 sets of simulated dynamic gene expression data, achieving a 95% hit rate with 639 of 674 genes correctly clustered. We also applied MPC to real data sets to build a phylogenetic tree for 34 strains from nine species of Mycobacterium and to cluster 698 genes from a yeast cell-cycle database. The results show higher classification accuracies than traditional clustering methods.

The University of Nebraska, Omaha

Virtual CGH: an integrative approach to predict genetic abnormalities from gene expression microarray data applied in lymphoma

Author: Ali Hesham
Geng Huimin
Iqbal Javeed
Publication venue: DigitalCommons@UNO
Publication date: 01/01/2011

Background: Comparative Genomic Hybridization (CGH) is a molecular approach for detecting DNA Copy Number Alterations (CNAs) in tumors, which are among the key causes of tumorigenesis. However, in the post-genomic era, most studies in cancer biology have focused on Gene Expression Profiling (GEP) rather than CGH, and as a result, an enormous amount of GEP data has accumulated in public databases for a wide variety of tumor types. We exploited this resource of GEP data to define possible recurrent CNAs in tumors. In addition, the CNAs identified by GEP would be more functionally relevant to disease pathogenesis, since the functional effects of CNAs can be reflected by altered gene expression. Methods: We proposed a novel computational approach, coined virtual CGH (vCGH), which employs hidden Markov models (HMMs) to predict DNA CNAs from their corresponding GEP data. vCGH was first trained on the paired GEP and CGH data generated from a sufficient number of tumor samples, and then applied to the GEP data of a new tumor sample to predict its CNAs. Results: Using cross-validation on 190 Diffuse Large B-Cell Lymphomas (DLBCL), vCGH achieved 80% sensitivity, 90% specificity and 90% accuracy for CNA prediction. The majority of the recurrent regions defined by vCGH are concordant with the experimental CGH, including gains of 1q, 2p16-p14, 3q27-q29, 6p25-p21, 7, 11q, 12 and 18q21, and losses of 6q, 8p23-p21, 9p24-p21 and 17p13 in DLBCL. In addition, vCGH predicted some recurrent functional abnormalities which were not observed in CGH, including gains of 1p, 2q and 6q and losses of 1q, 6p and 8q. Among those novel loci, 1q, 6q and 8q were significantly associated with the clinical outcomes in the DLBCL patients (p < 0.05). Conclusions: We developed a novel computational approach, vCGH, to predict genome-wide genetic abnormalities from GEP data in lymphomas. vCGH can be generally applied to other types of tumors and may significantly enhance the detection of functionally important genetic abnormalities in cancer research.
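
For readers unfamiliar with the reported measures, the following is a minimal sketch of how sensitivity, specificity and accuracy are conventionally computed from confusion-matrix counts. The function name `cna_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def cna_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of truly altered loci that were predicted altered
    specificity = tn / (tn + fp)  # share of unaltered loci that were predicted unaltered
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all loci predicted correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only; they are NOT the paper's data.
sens, spec, acc = cna_metrics(tp=80, fp=90, tn=810, fn=20)
```

Accuracy depends on how many altered and unaltered loci are present, so it generally lies between sensitivity and specificity, weighted by class size.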

The University of Nebraska, Omaha

A Dynamic Bayesian Network Model for Hierarchical Classification and its Application in Predicting Yeast Genes Functions

Author: Ali Hesham H.
Deng Xutao
Geng Huimin
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2005

In this paper, we propose a Dynamic Naive Bayesian (DNB) network model for classifying data sets with hierarchical labels. The DNB model is built upon a Naive Bayesian (NB) network, a successful classifier for data with flattened (nonhierarchical) class labels. The problems of using flattened class labels for hierarchical classification are addressed in this paper. The DNB has a top-down structure with each level of the class hierarchy modeled as a random variable. We define augmenting operations to transform the class hierarchy into a form that satisfies the probability law. We present algorithms for efficient learning and inference with the DNB model. The learning algorithm can be used to estimate the parameters of the network. The inference algorithm is designed to find the optimal classification path in the class hierarchy. The methods are tested on yeast gene expression data sets, and the classification accuracy with the DNB classifier is significantly higher than with the previous approach, flattened classification using the NB classifier.

AIS Electronic Library (AISeL)

Message Passing Clustering with Stochastic Merging Based on Kernel Functions

Author: Ali Hesham H.
Deng Xutao
Geng Huimin
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2005

In this paper, we propose a new Stochastic Message Passing Clustering (SMPC) algorithm for clustering biological data based on the Message Passing Clustering (MPC) algorithm, which we introduced in earlier work. MPC has shown its advantage when applied to describing parallel and spontaneous biological processes. SMPC, as a generalized version of MPC, extends the clustering algorithm from a deterministic process to a stochastic process, adding three major advantages. First, in deciding the merging cluster pair, the influences of all clusters are quantified by probabilities, estimated by kernel functions based on their relative distances. Second, the proposed algorithm properly resolves the “tie” problem, which often occurs for integer distances, as in the case of protein interaction data. Third, clustering can be undone to improve the clustering performance: when the algorithm detects objects that do not have good probabilities inside a cluster, it moves them outside. The test results on colon cancer gene-expression data show that SMPC performs better than the deterministic MPC.
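
The idea of turning relative distances into merge probabilities can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which kernel is used, so a Gaussian kernel is assumed here, and the names `merge_probabilities`, `sample_merge_pair` and `bandwidth` are hypothetical.

```python
import math
import random


def merge_probabilities(distances, bandwidth=1.0):
    """Map cluster-pair distances to merge probabilities.

    Assumption: a Gaussian kernel exp(-d^2 / (2 * bandwidth^2)), normalized
    over all candidate pairs. The paper's actual kernel may differ.
    """
    weights = {
        pair: math.exp(-(d * d) / (2.0 * bandwidth ** 2))
        for pair, d in distances.items()
    }
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def sample_merge_pair(distances, bandwidth=1.0, rng=None):
    """Draw the next pair to merge at random, weighted by its probability."""
    rng = rng or random.Random()
    probs = merge_probabilities(distances, bandwidth)
    pairs = list(probs)
    return rng.choices(pairs, weights=[probs[p] for p in pairs], k=1)[0]


# Integer distances with a tie between (A, B) and (A, C); values are made up.
distances = {("A", "B"): 1, ("A", "C"): 1, ("B", "C"): 3}
probs = merge_probabilities(distances)
chosen = sample_merge_pair(distances, rng=random.Random(0))
```

In this sketch, tied pairs receive equal probability and the random draw decides between them, which is one plausible reading of how a stochastic merge step can handle ties in integer-valued distances.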

AIS Electronic Library (AISeL)

Applications of Hidden Markov Models in Microarray Gene Expression Data

Author: Hesham H Ali
Huimin Geng
Xutao Deng
Publication venue: IntechOpen
Publication date: 19/04/2011

Hidden Markov models (HMMs) are well-developed statistical models for capturing hidden information from observable sequential symbols. They were first used in speech recognition in the 1970s and have been successfully applied to the analysis of biological sequences since the late 1980s, for example in finding protein secondary structure, CpG islands and families of related DNA or protein sequences [1]. In an HMM, the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. In this chapter, we describe two applications using HMMs to predict gene functions in yeast and DNA copy number alterations in human tumor cells, based on gene expression microarray data.
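
As a concrete illustration of recovering hidden states from observations, the sketch below implements the standard Viterbi algorithm, which finds the most likely hidden-state path for a given HMM. It is not the chapter's implementation. The two states ("normal", "gain"), the two symbols ("low", "high") and all probabilities are hypothetical toy values.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for the observed sequence."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               state at time t-1 on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor r that maximizes the path probability into s.
            prob, prev = max((best[-1][r][0] * trans_p[r][s], r) for r in states)
            column[s] = (prob * emit_p[s][obs], prev)
        best.append(column)

    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


# Toy model with made-up parameters: hidden copy-number state, observed expression.
states = ["normal", "gain"]
start_p = {"normal": 0.8, "gain": 0.2}
trans_p = {
    "normal": {"normal": 0.9, "gain": 0.1},
    "gain": {"normal": 0.1, "gain": 0.9},
}
emit_p = {
    "normal": {"low": 0.8, "high": 0.2},
    "gain": {"low": 0.2, "high": 0.8},
}
path = viterbi(["low", "low", "high", "high", "high"], states, start_p, trans_p, emit_p)
```

For long sequences, a practical implementation would work with log-probabilities to avoid numerical underflow; plain products are kept here for readability.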

IntechOpen

The University of Nebraska, Omaha

Dynamics of asynchronous random Boolean networks with asynchrony generated by stochastic processes

Author: Deng Xutao
Geng Huimin
Matache Mihaela Teodora
Publication venue: DigitalCommons@UNO
Publication date: 01/03/2007

An asynchronous Boolean network with N nodes whose states at each time point are determined by certain parent nodes is considered. We make use of the models developed by Matache and Heidel [Matache, M.T., Heidel, J., 2005. Asynchronous random Boolean network model based on elementary cellular automata rule 126. Phys. Rev. E 71, 026232] for a constant number of parents, and Matache [Matache, M.T., 2006. Asynchronous random Boolean network model with variable number of parents based on elementary cellular automata rule 126. IJMPB 20 (8), 897–923] for a varying number of parents. In both these papers the authors consider an asynchronous updating of all nodes, with asynchrony generated by various random distributions. We supplement those results by using various stochastic processes as generators for the number of nodes to be updated at each time point. In this paper we use the following stochastic processes: Poisson process, random walk, birth and death process, Brownian motion, and fractional Brownian motion. We study the dynamics of the model through sensitivity of the orbits to initial values, bifurcation diagrams, and fixed-point analysis. The dynamics of the system show that the number of nodes to be updated at each time point is of great importance, especially for the random walk, the birth and death, and the Brownian motion processes. Small or moderate values for the number of updated nodes generate order, while large values may generate chaos depending on the underlying parameters. The Poisson process generates order. With fractional Brownian motion, as the values of the Hurst parameter increase, the system exhibits order for a wider range of combinations of the underlying parameters
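
The setting can be illustrated with a much simplified sketch: elementary cellular automaton rule 126 on a ring, where only a randomly chosen subset of nodes is updated at each step and the subset size follows a bounded random walk. This is an assumption-laden toy, not the authors' model, which uses randomly assigned parent nodes and several different stochastic processes. The names `rule126`, `async_step` and `simulate` are hypothetical.

```python
import random


def rule126(left, center, right):
    """ECA rule 126: output 0 when all three cells agree, otherwise 1."""
    return 0 if left == center == right else 1


def async_step(state, k, rng):
    """Update k randomly chosen nodes of a ring; all other nodes keep their state."""
    n = len(state)
    chosen = rng.sample(range(n), k)
    new_state = list(state)
    for i in chosen:
        # Neighbors are read from the previous state, with wrap-around at the ends.
        new_state[i] = rule126(state[i - 1], state[i], state[(i + 1) % n])
    return new_state


def simulate(state, steps, rng):
    """Run the network with the number of updated nodes driven by a random walk.

    Assumption: a simple +/-1 random walk clamped to [1, n] stands in for the
    stochastic processes studied in the paper.
    """
    n = len(state)
    k = n // 2
    history = [list(state)]
    for _ in range(steps):
        k = min(n, max(1, k + rng.choice((-1, 1))))
        state = async_step(state, k, rng)
        history.append(state)
    return history


rng = random.Random(42)
history = simulate([0, 0, 1, 0, 0, 1, 1, 0], steps=10, rng=rng)
```

Setting `k` equal to the number of nodes makes every node update at once, which recovers the ordinary synchronous automaton and gives a convenient sanity check.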

The University of Nebraska, Omaha

Cross-platform Analysis of Cancer Biomarkers: A Bayesian Network Approach to Incorporating Mass Spectrometry and Microarray Data

Author: Ali Hesham H.
Deng Xutao
Geng Huimin
Publication venue: Libertas Academica
Publication date: 01/01/2007

Many studies showed inconsistent cancer biomarkers due to bioinformatics artifacts. In this paper we use multiple data sets from microarrays, mass spectrometry, protein sequences, and other biological knowledge in order to improve the reliability of cancer biomarkers. We present a novel Bayesian network (BN) model which integrates and cross-annotates multiple data sets related to prostate cancer. The main contribution of this study is that we provide a method that is designed to find cancer biomarkers whose presence is supported by multiple data sources and biological knowledge. Relevant biological knowledge is explicitly encoded into the model parameters, and the biomarker finding problem is formulated as a Bayesian inference problem. Besides diagnostic accuracy, we introduce reliability as another quality measurement of the biological relevance of biomarkers. Based on the proposed BN model, we develop an empirical scoring scheme and a simulation algorithm for inferring biomarkers. Fourteen genes/proteins including prostate specific antigen (PSA) are identified as reliable serum biomarkers which are insensitive to the model assumptions. The computational results show that our method is able to find biologically relevant biomarkers with highest reliability while maintaining competitive predictive power. In addition, by combining biological knowledge and data from multiple platforms, the number of putative biomarkers is greatly reduced to allow more-focused clinical studies

Directory of Open Access Journals

PubMed Central

Virtual CGH: an integrative approach to predict genetic abnormalities from gene expression microarray data applied in lymphoma

Author: Ali Hesham H
Chan Wing C
Geng Huimin
Iqbal Javeed
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Comparative Genomic Hybridization (CGH) is a molecular approach for detecting DNA Copy Number Alterations (CNAs) in tumors, which are among the key causes of tumorigenesis. However, in the post-genomic era, most studies in cancer biology have focused on Gene Expression Profiling (GEP) rather than CGH, and as a result, an enormous amount of GEP data has accumulated in public databases for a wide variety of tumor types. We exploited this resource of GEP data to define possible recurrent CNAs in tumors. In addition, the CNAs identified by GEP would be more functionally relevant to disease pathogenesis, since the functional effects of CNAs can be reflected by altered gene expression. Methods: We proposed a novel computational approach, coined virtual CGH (vCGH), which employs hidden Markov models (HMMs) to predict DNA CNAs from their corresponding GEP data. vCGH was first trained on the paired GEP and CGH data generated from a sufficient number of tumor samples, and then applied to the GEP data of a new tumor sample to predict its CNAs. Results: Using cross-validation on 190 Diffuse Large B-Cell Lymphomas (DLBCL), vCGH achieved 80% sensitivity, 90% specificity and 90% accuracy for CNA prediction. The majority of the recurrent regions defined by vCGH are concordant with the experimental CGH, including gains of 1q, 2p16-p14, 3q27-q29, 6p25-p21, 7, 11q, 12 and 18q21, and losses of 6q, 8p23-p21, 9p24-p21 and 17p13 in DLBCL. In addition, vCGH predicted some recurrent functional abnormalities which were not observed in CGH, including gains of 1p, 2q and 6q and losses of 1q, 6p and 8q. Among those novel loci, 1q, 6q and 8q were significantly associated with the clinical outcomes in the DLBCL patients (p < 0.05). Conclusions: We developed a novel computational approach, vCGH, to predict genome-wide genetic abnormalities from GEP data in lymphomas. vCGH can be generally applied to other types of tumors and may significantly enhance the detection of functionally important genetic abnormalities in cancer research.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Highly multiplexed and quantitative cell-surface protein profiling using genetically barcoded antibodies.

Author: Adams Jarrett J
Geng Huimin
Hornsby Michael
Hu Amy
Julien Olivier
Martinko Alexander J
Moffat Jason
Mou Yun
Müschen Markus
Ploder Lynda
Pollock Samuel B
Sidhu Sachdev S
Wells James A
Publication venue: eScholarship, University of California
Publication date: 01/03/2018

Human cells express thousands of different surface proteins that can be used for cell classification, or to distinguish healthy and disease conditions. A method capable of profiling a substantial fraction of the surface proteome simultaneously and inexpensively would enable more accurate and complete classification of cell states. We present a highly multiplexed and quantitative surface proteomic method using genetically barcoded antibodies called phage-antibody next-generation sequencing (PhaNGS). Using 144 preselected antibodies displayed on filamentous phage (Fab-phage) against 44 receptor targets, we assess changes in B cell surface proteins after the development of drug resistance in a patient with acute lymphoblastic leukemia (ALL) and in adaptation to oncogene expression in a Myc-inducible Burkitt lymphoma model. We further show PhaNGS can be applied at the single-cell level. Our results reveal that a common set of proteins including FLT3, NCR3LG1, and ROR1 dominate the response to similar oncogenic perturbations in B cells. Linking high-affinity, selective, genetically encoded binders to NGS enables direct and highly multiplexed protein detection, comparable to RNA-sequencing for mRNA. PhaNGS has the potential to profile a substantial fraction of the surface proteome simultaneously and inexpensively to enable more accurate and complete classification of cell states

eScholarship - University of California

Genome wide transcriptional analysis of resting and IL2 activated human natural killer cells: gene expression signatures indicative of novel molecular signaling pathways

Author: Chan Wing C
d'Amore Francesco
Dybkaer Karen
Geng Huimin
Iqbal Javeed
Schmitz Alexander
Xiao Li
Zhou Guimei
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Human natural killer (NK) cells are key contributors to the innate immune response, and the effector functions of these cells are enhanced by cytokines such as interleukin 2 (IL2). We utilized genome-wide transcriptional profiling to identify gene expression signatures and pathways in resting and IL2-activated NK cells isolated from the peripheral blood of healthy donors. Results: Gene expression profiling of resting NK cells showed high expression of a number of cytotoxic factors, cytokines, chemokines and inhibitory and activating surface NK receptors. Resting NK cells expressed many genes associated with cellular quiescence and also appeared to have an active TGFβ (TGFB1) signaling pathway. IL2 stimulation induced rapid downregulation of quiescence-associated genes and upregulation of genes associated with cell cycle progression and proliferation. Numerous genes that may enhance immune function and responsiveness, including activating receptors (DNAM1, KLRC1 and KLRC3), death receptor ligands (TNFSF6 (FASL) and TRAIL), chemokine receptors (CX3CR1, CCR5 and CCR7), interleukin receptors (IL2RG, IL18RAB and IL27RA) and members of secretory pathways (DEGS1, FKBP11, SSR3, SEC61G and SLC3A2), were upregulated. The expression profile suggested PI3K/AKT activation and NF-κB activation through multiple pathways (TLR/IL1R, TNF receptor induced and TCR-like, possibly involving BCL10). Activation of NFAT signaling was supported by increased expression of many pathway members and downstream target genes. The transcription factor GATA3 was expressed in resting cells, while T-BET was upregulated on activation, concurrent with the change in cytokine expression profile. The importance of NK cells in the innate immune response was also reflected by late increased expression of inflammatory chemotactic factors and receptors and of molecules involved in adhesion and lymphocyte trafficking or migration. Conclusion: This analysis allowed us to identify genes implicated in cellular quiescence and the cytokines and cytotoxic factors ready for immediate immune response. It also allowed us to observe the sequential immunostimulatory effects of IL2 on NK cells, improving our understanding of the biology and molecular mediators behind NK cell activation.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

VBN